Data Engineering For Scaling Language Models To 128K Context